Tagging with Combined Language Models and Large Tagsets

Author

Dan Tufiş

Abstract

The paper discusses experiments, results, applications and further developments in tagging a highly inflectional language, based on multiple register-diversified language models. The texts are accurately disambiguated in terms of a large tagset (611 tags) in two linear-time processing steps (tiered processing). The underlying tagger uses multiple register-specific language models simultaneously, and the final annotation is chosen by a combined-classifier decision-making procedure.
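The abstract does not say how the classifiers' decisions are combined. As one illustration, the sketch below assumes plain weighted majority voting: each register-specific model proposes one tag per token, and each model's vote counts with a weight such as its held-out accuracy. The function name, the weights and the placeholder tags are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def combine_taggings(taggings, weights):
    """Pick one tag per token by weighted voting over several taggers.

    taggings: one tag sequence per register-specific model (hypothetical input).
    weights:  one non-negative weight per model, e.g. held-out accuracy (assumption).
    """
    if not taggings:
        raise ValueError("need at least one tagging")
    n = len(taggings[0])
    if any(len(t) != n for t in taggings) or len(weights) != len(taggings):
        raise ValueError("taggings and weights must be aligned")
    combined = []
    for i in range(n):
        scores = defaultdict(float)
        for tags, w in zip(taggings, weights):
            scores[tags[i]] += w
        # Highest total weight wins; on a tie, max() keeps the tag that was
        # proposed first (dicts preserve insertion order).
        combined.append(max(scores, key=scores.get))
    return combined


if __name__ == "__main__":
    # Three hypothetical register models tagging the same three tokens.
    runs = [["NN", "VB", "DT"], ["NN", "NN", "DT"], ["JJ", "VB", "DT"]]
    print(combine_taggings(runs, [0.9, 0.8, 0.85]))  # ['NN', 'VB', 'DT']
```

Voting is only one possible combiner. A learned combiner or per-tag credibility scores would replace the fixed per-model weights used here.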


Similar resources

The paper describes a general method (as well as its implementation and evaluation) for deriving the mapping rules for the dif

The paper describes a general method (as well as its implementation and evaluation) for deriving mapping models for different tagsets available in existing training corpora (gold standards) for a specific language. These mapping models are further used to significantly improve the accuracy in the underlying training corpora and also for the assessment of the distributional adequacy of various t...


High Accuracy Tagging with Large Tagsets

The paper presents experiments and results related to morpho-syntactic (MS) tagging of a highly inflectional language, based on combining language models (LM) learnt from multiple register-diversified corpora. To cope with a large tagset (614 tags), our underlying tagger uses a hidden smaller tagset (92 tags), mapped back, after the proper tagging, into the initial tagset. The same text is tagg...
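The step of mapping the hidden tagset back into the initial tagset can be pictured as follows. This is a minimal sketch that assumes a lexicon listing, for each word form, the full tags (MSDs) it can take, and a many-to-one projection from full tags to hidden tags. The tag names, lexicon entries and function name are hypothetical placeholders. A single surviving candidate means the recovery was deterministic. Several candidates mean residual ambiguity, which a real system would have to resolve by further means.

```python
def recover_msds(tokens, hidden_tags, lexicon, to_hidden):
    """Map hidden (reduced) tags back to full lexicon tags (MSDs).

    lexicon:   word form -> set of full tags it can take (hypothetical entries).
    to_hidden: full tag -> hidden tag, a many-to-one projection (assumed).
    Returns, per token, the sorted full tags compatible with its hidden tag.
    """
    result = []
    for word, hidden in zip(tokens, hidden_tags):
        candidates = lexicon.get(word.lower(), set())
        # Keep only the full tags that project onto the tag the tagger chose.
        result.append(sorted(m for m in candidates if to_hidden.get(m) == hidden))
    return result


if __name__ == "__main__":
    lexicon = {"runs": {"N.pl", "V.3sg"}, "sheep": {"N.sg", "N.pl"}}
    to_hidden = {"N.sg": "N", "N.pl": "N", "V.3sg": "V"}
    print(recover_msds(["Runs", "sheep"], ["V", "N"], lexicon, to_hidden))
    # [['V.3sg'], ['N.pl', 'N.sg']]
```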


Using a Large Set of EAGLES-compliant Morpho-Syntactic Descriptors as a Tagset for Probabilistic Tagging

The paper presents one way of reconciling data sparseness with the requirement of high-accuracy tagging in terms of fine-grained tagsets. For lexicon encoding, EAGLES elaborated a set of recommendations that aimed to cover multilingual requirements and therefore resulted in a large number of features and possible values. Such an encoding, used for tagging purposes, would lead to very large tagset...


Tagset Mapping and Statistical Training Data Cleaning-up

The paper describes a general method (as well as its implementation and evaluation) for deriving mapping systems for different tagsets available in existing training corpora (gold standards) for a specific language. For each pair of corpora (tagged with different tagsets), one such mapping system is derived. This mapping system is then used to improve the tagging of each of the two corpora with...
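One way to picture a mapping system between two tagsets is as conditional probabilities estimated from co-occurrences. The sketch below assumes two token-aligned annotations of the same text, one per tagset. Such a pair could, for example, come from retagging one corpus with a tagger trained on the other, though that is an assumption about the setup rather than the paper's stated procedure. Tokens whose tag pair is improbable under the mapping are flagged as candidates for manual cleaning. The function names and the default threshold of 0.1 are hypothetical choices.

```python
from collections import Counter, defaultdict


def derive_tag_mapping(tags_a, tags_b):
    """Estimate P(tag_b | tag_a) from two token-aligned annotations of one text."""
    if len(tags_a) != len(tags_b):
        raise ValueError("annotations must be token-aligned")
    pair_counts = defaultdict(Counter)
    for a, b in zip(tags_a, tags_b):
        pair_counts[a][b] += 1
    mapping = {}
    for a, counter in pair_counts.items():
        total = sum(counter.values())
        mapping[a] = {b: c / total for b, c in counter.items()}
    return mapping


def flag_suspicious(tags_a, tags_b, mapping, threshold=0.1):
    """Return positions whose (a, b) pair is improbable under the mapping.

    Every tag in tags_a must appear in mapping (e.g. build it from the same data).
    """
    return [i for i, (a, b) in enumerate(zip(tags_a, tags_b))
            if mapping[a].get(b, 0.0) < threshold]


if __name__ == "__main__":
    a = ["NN"] * 9 + ["VB"]
    b = ["N"] * 8 + ["V", "V"]
    m = derive_tag_mapping(a, b)
    print(flag_suspicious(a, b, m, threshold=0.2))  # [8]
```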


Tiered Tagging Revisited

In this paper we describe a new baseline tagset induction algorithm which, unlike the one described in previous work, is fully automatic and produces tagsets with better performance than before. The algorithm is an information-lossless transformation of the MULTEXT-East compliant lexical tags into a reduced tagset that can be mapped back onto the lexicon tagset fully deterministically. From the baselin...
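The abstract does not describe the induction algorithm itself, but it does state the property the reduced tagset must preserve: the reduced tag must be mappable back onto the lexicon tag deterministically. Under the assumption that this means "word form plus reduced tag identifies exactly one lexicon tag", the check below lists the words whose lexicon tags collide under a candidate reduction. An empty result means the reduction is lossless in that sense. The lexicon, tag names and function name are hypothetical.

```python
from collections import defaultdict


def lossless_conflicts(lexicon, to_reduced):
    """Find words whose lexicon tags collide under a tagset reduction.

    lexicon:    word form -> set of full lexicon tags (hypothetical entries).
    to_reduced: full tag -> reduced tag; must cover every tag in the lexicon.
    Returns {word: {reduced_tag: sorted colliding full tags}}; {} means lossless.
    """
    conflicts = {}
    for word, tags in lexicon.items():
        by_reduced = defaultdict(set)
        for tag in tags:
            by_reduced[to_reduced[tag]].add(tag)
        clashes = {r: sorted(g) for r, g in by_reduced.items() if len(g) > 1}
        if clashes:
            conflicts[word] = clashes
    return conflicts


if __name__ == "__main__":
    lexicon = {"runs": {"N.pl", "V.3sg"}, "sheep": {"N.sg", "N.pl"}}
    coarse = {"N.sg": "N", "N.pl": "N", "V.3sg": "V"}
    finer = {"N.sg": "NS", "N.pl": "NP", "V.3sg": "V"}
    print(lossless_conflicts(lexicon, coarse))  # {'sheep': {'N': ['N.pl', 'N.sg']}}
    print(lossless_conflicts(lexicon, finer))   # {}
```

An induction procedure could use such a check as its constraint, merging tags only while no conflicts appear. Whether the paper's algorithm works this way is not stated in the abstract.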


Journal title:

Volume, issue:

Pages: -

Publication date: 2008
